206 research outputs found

    Fast missing value imputation using ensemble of SOMs

    This report presents a methodology for missing value imputation based on an ensemble of Self-Organizing Maps (SOMs), weighted using the Nonnegative Least Squares (NNLS) algorithm. Instead of the lengthy validation procedure needed when using a single SOM, the ensemble proceeds straight to final model building. The methodology therefore has very low computational time while retaining accuracy. Its performance is compared with other state-of-the-art methodologies on two real-world databases from different fields.
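The weighting step can be sketched as follows. This is a minimal illustration, not the paper's implementation: given each ensemble member's predictions on entries whose true values are known, nonnegative combination weights are fitted; a simple projected-gradient solver stands in for a production NNLS routine such as Lawson-Hanson.

```python
import numpy as np

def nnls_weights(P, y, steps=2000, lr=None):
    """Nonnegative least-squares weights for combining ensemble outputs.

    P: (n_samples, n_models) matrix of each model's predictions on
    entries whose true values are known; y: the true values.
    Solved by projected gradient descent (a sketch, not the
    Lawson-Hanson solver a production system would use)."""
    n, m = P.shape
    if lr is None:
        lr = 1.0 / (np.linalg.norm(P, 2) ** 2 + 1e-12)  # safe step size
    w = np.full(m, 1.0 / m)
    for _ in range(steps):
        grad = P.T @ (P @ w - y)
        w = np.maximum(w - lr * grad, 0.0)              # project onto w >= 0
    return w

# toy example: model 1 tracks the truth closely, model 2 is pure noise,
# so nearly all the weight should land on model 1
rng = np.random.default_rng(0)
y = rng.normal(size=200)
P = np.column_stack([y + 0.05 * rng.normal(size=200),
                     rng.normal(size=200)])
w = nnls_weights(P, y)
```

The nonnegativity constraint is what lets the ensemble down-weight uninformative members to exactly zero rather than cancelling them with negative coefficients.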

    Autoregressive time series prediction by means of fuzzy inference systems using nonparametric residual variance estimation

    We propose an automatic methodology framework for short- and long-term prediction of time series by means of fuzzy inference systems. In this methodology, fuzzy techniques and statistical techniques for nonparametric residual variance estimation are combined to build autoregressive predictive models implemented as fuzzy inference systems. Nonparametric residual variance estimation plays a key role in driving the identification and learning procedures. Concrete criteria and procedures within the proposed framework are applied to a number of time series prediction problems. The learn-from-examples method introduced by Wang and Mendel (W&M) is used for identification, and the Levenberg–Marquardt (L–M) optimization method is then applied for tuning. The W&M method produces compact and potentially accurate inference systems when applied after a proper variable selection stage, and the L–M method yields the best compromise between accuracy and interpretability of results among a set of alternatives. Delta test based residual variance estimates are used to select the best subset of inputs to the fuzzy inference systems as well as the number of linguistic labels for the inputs. On a diverse set of time series prediction benchmarks, the methodology is compared against least-squares support vector machines (LS-SVM), the optimally pruned extreme learning machine (OP-ELM), and k-NN based autoregressors. The advantages of the proposed methodology are shown in terms of linguistic interpretability, generalization capability and computational cost. Furthermore, the fuzzy models are shown to be consistently more accurate for prediction on time series coming from real-world applications.
    Funding: Ministerio de Ciencia e Innovación TEC2008-04920; Junta de Andalucía P08-TIC-03674, IAC07-I-0205:33080, IAC08-II-3347:5626
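The Wang-Mendel identification step can be sketched as follows. This is a minimal illustration of the learn-from-examples rule generation only (evenly spaced triangular memberships, one candidate rule per training example, conflicts resolved by rule degree), not the authors' full framework with L-M tuning:

```python
import numpy as np

def triangular_memberships(v, lo, hi, n_labels):
    """Degrees of membership of value v in n_labels evenly spaced
    triangular fuzzy sets covering [lo, hi]."""
    centers = np.linspace(lo, hi, n_labels)
    width = centers[1] - centers[0]
    return np.maximum(1.0 - np.abs(v - centers) / width, 0.0)

def wang_mendel_rules(X, y, n_labels=5):
    """Wang-Mendel rule generation (a sketch): each example proposes a
    rule mapping its best-matching input labels to its best-matching
    output label; conflicting rules (same antecedent) are resolved by
    keeping the one with the highest degree."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    ylo, yhi = y.min(), y.max()
    rules = {}
    for xi, yi in zip(X, y):
        degs = [triangular_memberships(v, l, h, n_labels)
                for v, l, h in zip(xi, lo, hi)]
        ant = tuple(int(d.argmax()) for d in degs)      # antecedent labels
        ydeg = triangular_memberships(yi, ylo, yhi, n_labels)
        degree = np.prod([d.max() for d in degs]) * ydeg.max()
        if ant not in rules or degree > rules[ant][1]:
            rules[ant] = (int(ydeg.argmax()), degree)
    return rules

# toy example: y = x, so each input label should map to the same
# output label, giving one rule per linguistic label
X = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
y = X[:, 0]
rules = wang_mendel_rules(X, y, n_labels=5)
```

The number of linguistic labels per input (`n_labels`) is exactly the quantity the abstract describes selecting via Delta test estimates.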

    Using multiple re-embeddings for quantitative steganalysis and image reliability estimation

    The quantitative steganalysis problem aims at estimating the amount of payload embedded inside a document. In this paper, JPEG images are considered, and by using a re-embedding based methodology it is possible to estimate the number of original embedding changes performed on the image by a stego source and to slightly improve the estimate relative to classical quantitative steganalysis methods. The major advance of this methodology is that it also makes it possible to obtain a confidence interval on the estimated payload. This confidence interval in turn makes it possible to evaluate how difficult an image is to steganalyze, by estimating the reliability of the output. The regression technique comes from the OP-ELM, and the reliability is estimated using linear approximation. The methodology is applied with a publicly available stego algorithm, regression model and database of images. The methodology is generic and can be used for any quantitative steganalysis problem of this class.

    Residual variance estimation using a nearest neighbor statistic

    In this paper we consider the problem of estimating E[(Y − E[Y|X])²] based on a finite sample of independent, but not necessarily identically distributed, random variables (X_i, Y_i), i = 1, …, M. We analyze the theoretical properties of a recently developed estimator. It is shown that the estimator has many theoretically interesting properties, while its practical implementation is simple.
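A nearest-neighbor residual variance statistic of this kind (the Delta test mentioned elsewhere in this listing) can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's analysis; it uses a brute-force O(M²) neighbor search:

```python
import numpy as np

def delta_test(X, y):
    """Nearest-neighbor estimate of the residual (noise) variance
    Var(Y - E[Y|X]): half the mean squared difference between each
    y_i and the y-value of x_i's first nearest neighbor."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude each point itself
    nn = d2.argmin(axis=1)                # index of first nearest neighbor
    return 0.5 * np.mean((y - y[nn]) ** 2)

# toy example: smooth function plus noise with variance 0.2**2 = 0.04;
# the estimate should land near 0.04 without ever fitting a model
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(2000, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, size=2000)
est = delta_test(X, y)
```

Because adjacent x-values have nearly equal E[Y|X], their y-difference is almost pure noise, which is why the statistic needs no model of the regression function.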

    Mutual Information Based Initialization of Forward-Backward Search for Feature Selection in Regression Problems

    Pure feature selection, in which variables are either kept in or dropped from the training data set, remains an unsolved problem, especially when the dimensionality is high. Recently, the Forward-Backward Search algorithm, using the Delta Test to evaluate candidate solutions, was presented and showed good performance. However, due to the locality of the search procedure, the initial starting point of the search is crucial for obtaining good results. This paper presents a new heuristic to find a more adequate starting point that can lead to a better solution. The heuristic sorts the variables using the Mutual Information criterion and then performs parallel local searches; these local searches provide the initial starting point for the actual parallel Forward-Backward algorithm.
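The local search itself can be sketched as a first-improvement toggle search over a binary inclusion mask. A toy criterion stands in here for the Delta Test, and the `start` argument is where the Mutual Information based initialization the paper proposes would be supplied; this is an illustration, not the paper's parallel implementation:

```python
import numpy as np

def forward_backward(n_vars, criterion, start=None):
    """Forward-backward selection (a sketch): starting from a mask of
    included variables, repeatedly flip any single variable whose
    inclusion/exclusion reduces the criterion, until no flip helps
    (i.e. until a local minimum of the criterion is reached)."""
    mask = np.zeros(n_vars, dtype=bool) if start is None else start.copy()
    best = criterion(mask)
    improved = True
    while improved:
        improved = False
        for j in range(n_vars):
            trial = mask.copy()
            trial[j] = ~trial[j]          # toggle variable j in or out
            score = criterion(trial)
            if score < best:
                best, mask, improved = score, trial, True
    return mask, best

# toy criterion: the "true" relevant subset is {0, 2}; every missing or
# extra variable costs 1, and the empty set is forbidden
target = np.array([True, False, True, False])
crit = lambda m: np.sum(m != target) + (np.inf if not m.any() else 0)
mask, score = forward_backward(4, crit)
```

With a real criterion such as the Delta Test, different starting masks reach different local minima, which is exactly why the initialization heuristic matters.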

    A boundary corrected expansion of the moments of nearest neighbor distributions

    In this paper, the moments of nearest neighbor distance distributions are examined. While the asymptotic form of such moments is well known, the boundary effect has thus far resisted a rigorous analysis. Our goal is to develop a new technique that allows a closed-form high order expansion, where the boundaries are taken into account up to the first order. The resulting theoretical predictions are tested via simulations and found to be much more accurate than the first order approximation obtained by neglecting the boundaries. While our results are of theoretical interest, they also have important applications in statistics and physics. As a concrete example, we mention estimating Rényi entropies of probability distributions. Moreover, the algebraic technique developed may turn out to be useful in other, related problems, including estimation of the Shannon differential entropy.
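The boundary effect itself is easy to see in simulation. The sketch below (an illustration, not the paper's expansion) estimates the mean nearest-neighbor distance for n uniform points in the unit square by Monte Carlo and compares it with the boundary-free asymptotic value 1/(2√n); points near the edges have fewer close neighbors, so the observed mean exceeds the boundary-free figure:

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 1000, 100
means = []
for _ in range(trials):
    P = rng.uniform(0, 1, size=(n, 2))            # n uniform points in the unit square
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                  # a point is not its own neighbor
    means.append(np.sqrt(d2.min(axis=1)).mean())  # mean nearest-neighbor distance
m = float(np.mean(means))
interior = 0.5 / np.sqrt(n)                       # boundary-free asymptotic mean
```

The excess of `m` over `interior` is the quantity the paper's first-order boundary correction accounts for.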

    Optimal pruned k-nearest neighbors: OP-KNN application to financial modeling

    The paper proposes a methodology called OP-KNN, which builds a one-hidden-layer feedforward neural network using nearest-neighbor neurons with extremely small computational time. The main strategy is to select the most relevant variables beforehand and then build the model using KNN kernels. Multiresponse Sparse Regression (MRSR) is used as a second step to rank each kth nearest neighbor, and as a third step Leave-One-Out estimation is used to select the number of neighbors and to estimate the generalization performance. This new methodology is tested on a toy example and applied to financial modeling.
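The third step can be sketched on its own: choosing the number of neighbors by Leave-One-Out error for a plain k-NN regressor. This is a simplified illustration that omits the MRSR neighbor-ranking step of OP-KNN; for k-NN, LOO is cheap because dropping a point just means excluding it from its own neighbor list:

```python
import numpy as np

def loo_knn_select(X, y, max_k=20):
    """Pick the number of neighbors k by leave-one-out mean squared
    error for a k-NN regressor (a sketch of the model-selection step
    only; OP-KNN's MRSR ranking of neighbors is omitted)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)       # leave-one-out: drop the point itself
    order = d2.argsort(axis=1)         # neighbors sorted by distance
    errs = []
    for k in range(1, max_k + 1):
        pred = y[order[:, :k]].mean(axis=1)
        errs.append(np.mean((y - pred) ** 2))
    best_k = int(np.argmin(errs)) + 1
    return best_k, errs

# toy example: noisy parabola; LOO should prefer averaging over
# several neighbors rather than copying the single nearest one
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.1, size=300)
k, errs = loo_knn_select(X, y)
```

The LOO error at the selected k also serves as the generalization estimate the abstract mentions.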

    RMSE-ELM: Recursive Model based Selective Ensemble of Extreme Learning Machines for Robustness Improvement

    The extreme learning machine (ELM), an emerging branch of shallow networks, has shown excellent generalization and fast learning speed. However, for blended data the robustness of ELM is weak, because the weights and biases of its hidden nodes are set randomly, and noisy data exert a further negative effect. To solve this problem, a new framework called RMSE-ELM is proposed in this paper. It is a two-layer recursive model. In the first layer, the framework concurrently trains many ELMs in different groups, then employs selective ensembling to pick an optimal set of ELMs from each group; these sets are merged into a large group of ELMs called the candidate pool. In the second layer, selective ensembling is applied recursively to the candidate pool to obtain the final ensemble. In the experiments, UCI blended datasets are used to confirm the robustness of the new approach in two key respects (mean square error and standard deviation). The space complexity of the method increases to some degree, but the results show that RMSE-ELM significantly improves robustness at slightly higher computational time compared with representative methods (ELM, OP-ELM, GASEN-ELM, GASEN-BP and E-GASEN). It is a potential framework for solving the robustness issue of ELM for high-dimensional blended data.
    Comment: Accepted for publication in Mathematical Problems in Engineering, 09/22/201
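The base learner of the ensemble can be sketched as follows: a basic single-hidden-layer ELM with random input weights and biases, sigmoid hidden units, and output weights solved by least squares. This is an illustration of the base learner only, not the paper's recursive selective-ensemble framework:

```python
import numpy as np

def elm_train(X, y, n_hidden=50, rng=None):
    """Train a basic ELM (sketch): hidden-layer weights and biases are
    drawn at random and never tuned; only the linear output weights
    are fitted, by ordinary least squares on the hidden activations."""
    rng = np.random.default_rng(rng)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # random sigmoid features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# toy example: fit a smooth 2D target; the random hidden weights are
# exactly the source of run-to-run variance that RMSE-ELM's selective
# ensembling is designed to suppress
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(400, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2
W, b, beta = elm_train(X, y, n_hidden=100, rng=0)
mse = np.mean((elm_predict(X, W, b, beta) - y) ** 2)
```

Training many such networks and keeping only a well-chosen subset is what trades a modest increase in space complexity for the robustness gain the abstract reports.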